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Abstract — This paper addresses the problems of relabeling 
and summarizing posterior distributions that typically arise, in 
a Bayesian framework, when dealing with signal decomposi- 
tion problems with an unknown number of components. Such 
posterior distributions are defined over union of subspaces of 
differing dimensionality and can be sampled from using modern 
Monte Carlo techniques, for instance the increasingly popular 
RJ-MCMC method. No generic approach is available, however, 
to summarize the resulting variable-dimensional samples and 
extract from them component-specific parameters. 

We propose a novel approach, named Variable-dimensional 
Approximate Posterior for Relabeling and Summarizing (VAPoRS), 
to this problem, which consists in approximating the poste- 
rior distribution of interest by a "simple" — but still variable- 
dimensional — parametric distribution. The distance between the 
two distributions is measured using the Kullback-Leibler di- 
vergence, and a Stochastic EM-type algorithm, driven by the 
RJ-MCMC sampler, is proposed to estimate the parameters. 
Two signal decomposition problems are considered, to show the 
capability of VAPoRS both for relabeling and for summarizing 
variable dimensional posterior distributions: the classical prob- 
lem of detecting and estimating sinusoids in white Gaussian noise 
on the one hand, and a particle counting problem motivated by 
the Pierre Auger project in astrophysics on the other hand. 

Index Terms — Bayesian inference; Signal decomposition; 
Trans-dimensional MCMC; Label-switching; Stochastic EM. 



I. Introduction 

Nowadays, owing to the advent of Markov Chain Monte 
Carlo (MCMC) sampling methods [2-5], Bayesian data anal- 
ysis is considered as a conventional approach in machine learn- 
ing, signal and image processing, and data mining problems — 
to name but a few. Nevertheless, in many applications, prac- 
tical challenges remain in the process of extracting, from the 
generated samples, quantities of interest to summarize the 
posterior distribution. 

Summarization consists, loosely speaking, in providing a 
few simple yet interpretable parameters and/or graphics to 
the end-user of a statistical method. For instance, in the case 
of a scalar parameter with a unimodal posterior distribution, 
measures of location and dispersion (e.g., the empirical mean 
and the standard deviation, or the median and the interquar- 
tile range) are typically provided in addition to a graphical 
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summary of the distribution (e.g., a histogram or a kernel 
density estimate). In the case of multimodal distributions, 
summarization becomes more difficult but can be carried 
out using, for instance, the approximation of the posterior 
by a Gaussian Mixture Model (GMM) [6]. Summarizing or 
approximating posterior distributions has also been used in 
designing proposal distributions of Metropolis-Hastings (MH) 
samplers in an adaptive MCMC framework; see, e.g., [7-9]. 

This paper addresses the problem of summarizing posterior 
distributions in the case of some trans-dimensional problems 
(i.e., "problems in which the number of things that we don't 
know is one of the things that we don't know" [10, 11]). More 
specifically, we concentrate on the problem of signal decom- 
position when the number of components is unknown, which 
is an important case of trans-dimensional problem. Examples 
of such problems include the detection and estimation of 
sinusoids in white Gaussian noise [12] and the related problem 
of estimating directions of arrival in array processing [13], the 
detection of objects in images [14, 15], and the detection of 
physical particles (neutrons, muons, . . . ) using noisy data from 
various types of sensors, for instance in spectroscopy [16] or 
astrophysics [17, 18]. 

Let y = (yi,}'2) •••>)'iv)' be a vector of N observations, 
where the superscript t stands for vector transposition. In 
signal decomposition problems, the model space is a finite 
or countable set of models, . // = {^k, k G J^}, where Ic 
denotes the number of components and Jtf C N the set of its 
possible values. It is assumed here that, under there are 
k components with vectors of component-specific parameters 
01* = ($!,..., 6 k ) G 0*, where © C W 1 and ©° = {0}. One 
feature that the problems we are considering have in common 
is the invariance of the likelihood p (y | k, 0i*) Wim respect to 
permutations (relabeling) of the components, which is called 
the "label-switching" issue in the literature; see, e.g., [19-23]. 
We will discuss this issue further in Section I-A. 

In a Bayesian framework, a joint posterior density 
f (k, 0i*) = p{k, 0i* |y) is obtained through Bayes' formula 
for the number k of components and the vector of component- 
specific parameters, after assigning prior distributions on them: 



/(Mi*) - p(y\k,e hk )p{e hk \k)p(k), 



(i) 



where indicates proportionality. This density is defined over 
a variable-dimensional space ©, which is a union of subspaces 
of differing dimensionality, i.e., = U,t>o{£} X & k . 

The posterior density (1) completely describes the informa- 
tion (and the associated uncertainty) provided by the data y 
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about the candidate models and the vector of unknown pa- 
rameters. Since it is only known up to a normalizing constant 
in most cases, Monte Carlo simulation methods, such as the 
Reversible Jump MCMC (RJ-MCMC) sampler [10], have been 
widely used to approximate it. 

A. The label-switching issue 

One of the most challenging issues when attempting at 
summarizing posterior distributions, that even occurs in fixed- 
dimensional situations, is the label-switching phenomenon 
(see, e.g., [19-23]), which is caused by the invariance of both 
the likelihood and the prior distribution under permutations 
of the components. As a consequence, the component-specific 
marginal posterior distributions are all equal, and therefore 
useless for the purpose of summarizing the information con- 
tained in the posterior distribution about individual compo- 
nents. 

The simplest way of dealing with the label-switching issue 
is to introduce an Identifiability Constraint (IC), such as 
sorting the components with respect to one of their parameters; 
see [19] for more discussion concerning the use of ICs in 
the problem of Bayesian analysis of GMM. However, in most 
practical examples, choosing an appropriate IC manually is 
not feasible. Many relabeling algorithms have therefore been 
developed to "undo" the label-switching effect automatically, 
but all of them are restricted to the case of ^zxec/-dimensional 
posterior distributions; see [23-25] for recent advances and 
references. 

In variable-dimensional posterior distributions, there is an 
extra uncertainty about the "presence" of components, in ad- 
dition to their location. This challenging problem has hindered 
previous attempts to undo label-switching in the variable- 
dimensional scenario, where, according to [26] "the meaning 
of individual components is vacuous". This argument will be 
clarified in the following illustrative example. 

B. Illustrative example: joint Bayesian detection and estima- 
tion of sinusoids in white Gaussian noise 

In this example, it is assumed that under ^ k , the observed 
signal y is composed of k sinusoidal components observed in 
white Gaussian noise. That is, under Jt k , 

k 

y[i] = ( a c.jcos(c0ji) + a s jsm((Oji)) + «[;'], 

7=1 

where a c j and a s j are the cosine and sine amplitudes, and 
(Oj is the radial frequency of the / h sinusoidal component. 
Moreover, n is a white Gaussian noise of variance a 2 . 

The unknown parameters are the number k of sinusoidal 
components, the vectors 9j = (a c j,a s j,C0j) of component- 
specific parameters, 1 < j < k, and the noise variance a 2 . 
Thus, © = R 2 x (0, n) and = (u k > {k} x ©*) Ul+. We use 
the hierarchical model, prior distributions, and the RJ-MCMC 
sampler proposed in [12] for this problem; the interested reader 
is thus referred to [10, 12] for more details 1 . 

'in fact, the "Birth-or-Death" moves' acceptance ratio provided in the 
seminal paper [12] is erroneous. See [1, Chapter 1] or [27] for justification 
and true expression of the acceptance ratio, which is used in this paper. 
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Figure 1 : Posterior distributions of k (left) and sorted radial frequen- 
cies, <B\ :k , given k (right) from 100 000 output RJ-MCMC samples. 
The true number of components is three. The vertical dashed lines 
in the right figure locate the true radial frequencies. 



Figure 1 represents the posterior distributions of both the 
number k of components and the sorted radial frequen- 
cies tO\± = (»i , . . . , COkf given k obtained using 100000 sam- 
ples generated by the RJ-MCMC sampler. Note that, here, we 
used sorting to mitigate the effect of label-switching for visual- 
ization. Each row is dedicated to one value of k, for 2 < k < 4. 
Observe that other models have negligible posterior probabili- 
ties, since p(2 < k < 4 | y) = 0.981. In the experiment, the ob- 
served signal of length N = 64 consists of three sinusoids with 
energies A i:k = (20,6.32,20)', where Aj = a 2 . j + a 2 j7 phases 
= (0, 7T/4, 7t/3)', where <pj = — arctan(a JJ /fl tJ ), and 
true radial frequencies m hk = (0.63,0.68,0.73)'. The SNR = 
||Dai :J t|| 2 / (Na 2 ), where a hk = (a c . i, a s ,i, . . . , a C}k , a^ k )' and 
D is the Nx2k "design matrix" of sines and cosines associated 
to fi>i : fc, is set to the moderate value of 7dB. 

Roughly speaking, two approaches co-exist in the literature 
for summarizing variable-dimensional posterior distributions: 
Bayesian Model Selection (BMS) and Bayesian Model Aver- 
aging (BMA). The BMS approach ranks models according to 
their posterior probabilities p(k | y), selects one model, denoted 
by k MAP here, where MAP stands for Maximum A Poste- 
riori, and then summarizes the posterior distribution of the 
component-specific parameters under the (fixed-dimensional) 
selected model. This is at the price of losing valuable informa- 
tion provided by the other (discarded) models. For instance, 
in the example of Figure 1, all information about the small — 
and therefore harder to detect — middle component is lost by 
selecting the most a posteriori probable model On the 
other hand, the BMA approach consists in reporting results 
that are averaged over all possible models. Although the BMA 
approach is suitable for signal reconstruction and prediction 
purposes (see, e.g., [28] and references therein), it is not 
appropriate for studying component-specific parameters, the 
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number of which changes in each model 2 . More information 
concerning these two approaches can be found in [10, 28] and 
references therein. 

To the best of our knowledge, no generic method is currently 
available that would allow to summarize the information that 
is so easily read on Figure 1 for this very simple example: 
namely, that there seem to be three sinusoidal components in 
the observed noisy signal, the middle one having a smaller 
"probability of presence" than the others. 

C. Outline of the paper 

In this paper, we propose a novel approach, named Variable- 
dimensional Approximate Posterior for Relabeling and Sum- 
marizing (VAPoRS), for relabeling and summarizing posterior 
distributions defined over variable-dimensional subspaces that 
typically arise in signal decomposition problems when the 
number of components is unknown. It consists in approxi- 
mating the true posterior distribution with a parametric model 
(of varying-dimensionality), by minimization of the Kullback- 
Leibler (KL) divergence between the two distributions. A 
Stochastic Expectation Maximization (SEM)-type algorithm 
[29-31], driven by the output of an RJ-MCMC sampler, is 
used to estimate the parameters of the approximate model. 

VAPoRS shares some similarities with the relabeling algo- 
rithms proposed in [20, 24, 25] to solve the label switching 
problem, and also with the EM-type algorithm used in [8] in 
the context of adaptive MCMC algorithms (both in a fixed- 
dimensional setting). The main contribution of this paper is 
the introduction of an original variable-dimensional parametric 
model, which allows to tackle directly the difficult problem of 
approximating a distribution defined over a union of subspaces 
of differing dimensionality — and thus provides a first solution 
to the "trans-dimensional label-switching" problem, so to 
speak. 

Perhaps, the algorithm that we propose can be seen as a 
realization of the idea that M. Stephens had in mind when he 
stated [32, page 94]: 

"This raises the question of whether we might be able to 
obtain an alternative view of the [variable-dimensional] poste- 
rior by combining the results for all different k's, and grouping 
together components which are "similar", in that they have 
similar predictive density estimates. However, attempts to do 
this have failed to produce an easily interpretable results." 

The paper is organized as follows. Section II introduces 
the proposed model and stochastic algorithm for relabeling 
and summarizing variable dimensional posterior distributions. 
Section III illustrates the performance of VAPoRS using two 
signal decomposition examples, namely, the problem of joint 
Bayesian detection and estimation of sinusoids in white Gaus- 
sian noise and the problem of joint Bayesian detection and 
estimation of particles in the Auger project (in astrophysics). 
Section IV confirms the performances of VAPoRS using a 
Monte Carlo experiment. Finally, Section V concludes the 
paper and gives directions for future work. 

2 See, however, the intensity plot provided in Section III (Figure 7) as an 
example of a BMA summary related to a component-specific parameter. 



II. VAPoRS 

We assume that the target posterior distribution, defined on 
the variable-dimensional space = Uiejr W x ®*> admits a 
probability density function (pdf) / with respect to the kd- 
dimensional Lebesgue measure on each {k} x ® k , k e X. 

Our objective is to approximate the true posterior density / 
using a "simple" parametric model q^, where 77 is the vector 
of parameters defining the model. The pdf q^ will also be 
defined on the variable-dimensional space (i.e., it is not 
a fixed-dimensional approximation as in the BMS approach). 
We assume that a Monte Carlo sampling method — e.g., an 
RJ-MCMC sampler [10] — is available to generate M sam- 
ples from /, which we denote by 6® = for 
i=l,...,M. 

A. Variable-dimensional parametric model 

Instead of trying to describe the proposed parametric family 
of densities {q^} directly, let us now adopt a generative 
point of view, i.e., let us describe how to sample an 0- 
valued random variable 6 — (k,d\±) from the corresponding 
probability distribution. We assume that a positive integer L is 
given, which represents the number of "components" present 
in the posterior. 

First we generate, independently for each of the L com- 
ponents, a binary indicator variable |; € {0,1} drawn from 
the Bernoulli distribution SSer{%i), where ^ = 1 indicates 
that the corresponding component is present (otherwise, it is 
absent) in 6. The actual number k of components in the gen- 
erated samples is thus defined as k = Yji=\£,i- The parameter 
%i € (0, 1] will be called the "probability of presence" of the I th 
component. 

Second, given the vector of indicator variables % = 
(£1, ... , <^l), a ©-valued random vector is generated for each 
component that is present (i.e., for each I such that E,[ = 1). 
This random vector is generated according to some probability 
distribution associated to the component, that will be assumed 
to be a ©-truncated J-dimensional Gaussian distribution with 
mean fl t and co variance matrix E; in this paper 3 . In order 
to achieve the required invariance with respect to component 
relabeling, the generated vectors are randomly 4 arranged in a 
vector e hk = (6i,...,6 k ). 

Contemplating the posterior distributions of the sorted radial 
frequencies depicted in the right panel of Figure 1, particularly 
the plots related to the models with three and four sinusoidal 
components, it can be observed that there are "diffuse parts" 
in the RJ-MCMC output samples resulting in the heavy asym- 
metric tails of some components. It is clear that a model made 
of Gaussian components only is not capable of describing 
these diffuse samples, at least not in a parsimonious way. 
These abnormal observations, with respect to the bulk of the 
observed data, or, simply outliers, can adversely influence the 

3 Note that any d-dimensional parametric family of distributions could be 
used at this point. As often in the literature [8, 20, 24, 25], the Gaussian 
distribution is chosen here as a convenient mean of describing a "compact" 
and unimodal rf-dimensional distribution, nothing more. 

4 More precisely, a permutation of the k components that are present is 
drawn uniformly in the set of all permutations. 
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Figure 2: Proposed variable-dimensional parametric model in a 
generative viewpoint. 

process of fitting the approximate posterior to the true posterior 
distribution of interest and consequently lead to meaningless 
parameter estimates. 

To overcome this robustness issue, we propose to include in 
the model a "noise-like" Poisson Point Process (PPP; see, e.g., 
[33]) to account for the presence of outliers in the observed 
samples. We assume that the PPP is homogeneous 5 on ©, 
with intensity A/|0|. The number of components generated 
by the PPP thus follows a Poisson distribution with mean X. 
To be consistent with our previous notations, we denote by 
G N this number; note that , . . . , %l still take their values 
in {0, 1}. The (extended) vector % thus follows the probability 
distribution 

<;l+i! y 

Given <i; L+1 random samples are generated uniformly 
on ® and inserted randomly among the samples drawn from 
the Gaussian components. We denote by qrj the pdf of the 
random variable 6 = (k,&i±) that is thus generated, with 

1? = and 1\i = {V-i^h^l)- Fi g ure 2 provides 

the directed acyclic graph of the model. 

B. Distribution of the labeled samples 

A random variable 8 = (k, (fli, ■ ■ ■ , 8^)) drawn from the 
density q^ can be thought of as an "unlabeled sample", since 
the label /G«5?={1,...,L+1} of the component from 
which each 6 j (1 < j < k) originates cannot be recovered 
from 6 itself. Let us now introduce the (variable-dimensional) 
allocation vector 

z = (*,(zi,-,z*)) G U {k}x^ k , 

keJtr 

which provides the missing piece of information: Zj = I 
indicates that 9j originates from the I th (Gaussian) component 
if / < L, while z } ■— L + 1 indicates that 8j originates from the 
point process component. We will refer to the pair (8,z) as 
a labeled sample. In the following, we will derive its joint 
distribution qrj(8,z) = q^(8 |z)g,}(z). 

'Homogeneity is assumed here for the sake of simplicity, but more elaborate 
(non-homogeneous) models are easily accommodated by our approach, if 
needed. 



The distribution of the allocation vector z is 

9ij( z ) = 9tj( z I £)tfri(£), (3) 

where ^tj(^) is given in (2). Note that % is a deterministic 
function of z: % = n(z), with n/(z) = Dy =1 l z -=/, for 1 < / < 
L + 1 . To compute the first term of (3), remember that the 
points generated by the components of the parametric model 
are randomly arranged in 6\±. Therefore, for all % G {0, 1} L x 
N such that £f +/ & = k, 

since two arrangements that differ only by the position of 
the points corresponding to the PPP give rise to the same 
allocation vector. 

The conditional distribution qrj(8 | z) reads 

q„(6\z) = Uqn(6j\zj), (5) 

7=1 

where 

[ l©f d ZJ =L+l. 
Therefore, from Equations (2) to (6), we have 

x n*?(l-*/) (1 ~ 6) (7) 

i=i 

where . . . , = "( z ) an d ^ is the set of all allocation 
vectors (i.e., the set of all z G ^kex{k} x -Sf* such that = 
«,(z) G {0,1}, for 1 <l <L). 

C. Estimating the model parameters 

We propose to fit the parametric distribution q^ to the 
posterior / of interest by minimizing a divergence measure 
from / to qrj. We use the KL divergence as a divergence 
measure in this paper, though other divergence measures can 
be used as well 6 . 

Denoting the KL divergence from / to q^ by 
DKL{f{B)\\qr\{6)), we define the criterion to be minimized 
as 

Sft) 4 D KL {f{8)\\q n {6)) = f /(fl)log^-de. 

Using samples generated by the RJ-MCMC sampler, this 
criterion can be approximated as 

i M 

jin) * Mn) = --Eiog^e^j+c, (so 

6 see [1, Chapter 2] where another divergence measure proposed by [34] 
has been used for this problem for robustness reasons. 
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At the (r+ l) th iteration, do: 

(S-step) Draw allocation vectors z^ l,r+1 \ 1 <i<M, 
using an IMH step with target #-( r ) (• | O^ 1 '). 

(E-step) Construct the pseudo-completed log-likelihood 

JuW) = -ieiiog(? I) (e ( ' ) ,z('>+ 1 ))). 

(M-step) Estimate such that 

7j( r+1 ) _ ar g m i n ^ ^ m (ij). 

Figure 3: Proposed SEM-type algorithm 

where C is a constant that does not depend on X] . One should 
note that minimizing ^ ' amounts to choosing 

M 

Tj = argmax^log^flW)]. (9) 

j= l 

To estimate the model parameters J] G N, one of the 
extensively used algorithms for Maximum Likelihood (ML) 
parameter estimation in latent variable models is the EM 
algorithm proposed by [35]. However, it turns out that the EM 
algorithm, which has been used in similar works [8, 20, 24], 
is not appropriate for solving this problem, as computing 
the expectation in the E-step is intricate. More explicitly, in 
our problem the computational burden of the summation in 
the E-step over the set of all possible allocation vectors z 
increases very rapidly with both L and k. In fact, even for 
moderate values of L and k, say, L = 15 and k — 10, the 
summation is far too expensive to compute as it involves 
iLoTIzibjr^ 1-3 10 10 terms. 

In this paper, we propose to use the SEM algorithm [29- 
31], a variation of the EM algorithm in which the E-step is 
substituted with stochastic simulation of the latent variables 
from their conditional posterior distributions given the pre- 
vious estimates of the unknown parameters. In other words, 
at the iteration r+1 of the SEM algorithm, denoting the 
estimated parameters at iteration r by f\ ^ , for ; = 1 , . . . , M, 
the allocation vectors zW are drawn from #-(,•)(• | 6^). This 
step is called the Stochastic (S)-step. Then, these random 
samples are used to construct the so-called pseudo-completed 
log-likelihood. 

Exact sampling from q^( r] (• | 9^), as required by the S-step 
of the SEM-type algorithm, is unfortunately not feasible — not 
even using the accept-reject algorithm, due to the heavily com- 
binatorial expression of the normalizing constant q^ r) {d^). 
Instead, since 

^ W (z«|0«) «* ^ w (0«,z») 

can be computed up to a normalizing constant, we choose 
to use an Independent Metropolis-Hasting (IMH) step with 
#„( r )(zW | flW) as its stationary distribution; see [1, Algo- 
rithm 2.2] for more details. The proposed SEM-type algorithm 
is summarized in Figure 3. 



Remark 1. It would also be possible to assign prior dis- 
tributions over the unknown parameters J] and study their 
posterior distributions (for example, using an MCMC sampler 
with the latent variable z added to the state of the chain, in the 
spirit of the "data augmentation" algorithm [36]). This would, 
however, leave the label-switching issue unsolved (because of 
the invariance of q^ to permutations of its components). 

Remark 2. Convergence results of the SEM algorithm in 
the general form are provided by [31] and, in the particular 
example of mixture analysis problems, by [37]. Unfortunately, 
the assumptions in [31, 37] do not hold in the problem we are 
dealing with as, 1) the observed samples 6^ are correlated, 
owing to the fact that they are generated from the true posterior 
distribution using some MCMC methods, e.g., the RJ-MCMC 
sampler; 2) an I-MH sampler is used to draw zW from 
the conditional posterior distribution. Nevertheless, empirical 
evidence of the "good" convergence properties of the SEM- 
type algorithm we proposed will be provided in the next two 
sections. 

D. Robustified algorithm 

Preliminary experiments with the SEM-type algorithm de- 
scribed in Figure 3 were not satisfactory, because the sample 
mean and (co)variance estimates in the M-step, obtained from 
minimizing the KL divergence from the posterior distribu- 
tion / to the parametric model q-q, still suffer from sensitivity 
to the outliers in the observed samples, even after including 
the Poisson point process component. As a workaround, 
we propose to use robust estimates [38] of the means and 
(co)variances of Gaussian distributions instead of the empirical 
means and (co)variances in the M-step. For example, in the 
case of univariate Gaussian distributions, one can use the 
median and the interquartile range as robust estimators of the 
mean and variance, respectively. See [1, Section 2.5] for more 
discussion of the robustness issue, including an alternative 
solution using the "robust divergence" of [34]. 

Remark 3. Similar robustness concerns are widespread in the 
clustering literature; see, e.g., [39] and references therein. 

III. Illustrative examples 

In this section, we will investigate the capability of VAPoRS 
for summarizing variable-dimensional posterior distributions 
using two signal decomposition examples; 1) joint Bayesian 
detection and estimation of sinusoids in white Gaussian 
noise [12] and 2) joint Bayesian detection and estimation of 
astrophysical particles in the Auger project [17, 18]; see [1, 
Chapters 3 and 4] for more results and discussion. We em- 
phasize again that the output of the trans-dimensional Monte 
Carlo sampler, e.g, the. RJ-MCMC sampler in this paper, is 
considered as the observed data for VAPoRS. 

A. Joint Bayesian detection and estimation of sinusoids in 
white Gaussian noise 

Let us consider the problem of detection and estimation 
of sinusoidal components introduced in Section I-B where 
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Gaussian Comp. #1 



Gaussian Comp. #2 



40 



-a 20 



40 60 
SEM iteration 

Figure 4: Evolution of the model parameters along with the crite- 
rion defined in (8) using 100 iterations of VAPoRS with L = 3 
on the RJ-MCMC output samples shown in Figure 1. 



the unknown parameters are the number k of components, 
the component-specific parameters (a c j,a s j,C0j), 1 < j < k, 
and the noise variance a 2 . Since the amplitudes and the 
noise variance can be analytically integrated out, we focus on 
summarizing the joint posterior distribution p(k, fi>i : £ | y) of 
the form illustrated in Figure 1. Therefore, we assume that 
the proposed parametric model introduced in Section II-A 
consists of univariate Gaussian components, with means fi[, 
variances s 2 , and probabilities of presence 7fy, 1 < / < L, 
to be estimated. Moreover, the space of component-specific 
parameters is © = (0,7r) C M.. 

Before launching VAPoRS, we need first to initialize the 
parametric model. It is natural to deduce the number L of 
Gaussian components from the posterior distribution of k. 
Here, we set it to the 90 th percentile of p(k\ y) to keep all the 
probable models in the play. To initialize the Gaussian compo- 
nents' parameters, i.e., jXi and sj, 1 < / < L, we used the robust 
estimates of the means and variances of the marginal posterior 
distributions of the sorted radial frequencies given k = L. 
Finally, we set % x = 0.9, for 1 < / < L, and A = 0.1. 

We ran the "robustified" stochastic algorithm introduced in 
Section II on the specific example shown in Figure 1, for 
100 iterations, with L = 3 Gaussian components (note that 
the posterior probability of {k < 3} is approximately 90.3%). 
To assess the convergence of VAPoRS, Figure 4 illustrates 
the evolution of the model parameters 17 together with the 
criterion ^ . Two substantial facts showing the convergence of 
VAPoRS can be deduced from this figure: first, the decreasing 
behavior of the criterion ^m, which is almost constant after 
the 10 th iteration; second, the convergence of the parameters 
of the parametric model, particularly the means \ii and proba- 
bilities of presence 7i[, 1 < / < L, even though we used a naive 
initialization procedure. Indeed after the 40 th iteration there is 
no significant move in the parameter estimates. 

As discussed in Section I, one of the main objectives of the 
algorithm we proposed is to solve the label-switching issue in 
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Figure 5: Histogram of the labeled samples, that is, the samples 
allocated to the Gaussian and Poisson point process components, 
versus the pdf's of estimated Gaussian components in the model 
(black solid line) using VAPoRS on the sinusoid detection example. 
The estimated parameters of each component are presented in the 
corresponding panel. 



a trans-dimensional setting. Figures 5 shows the histograms of 
the labeled samples, i.e., (0^,zW), with i= 1,...,M, along 
with the pdf's of the estimated Gaussian components (black 
solid line). Moreover, the summaries provided by VAPORS for 
each component are presented in its corresponding panel. We 
used the average of the last 50 SEM iterations as parameter 
estimates, as recommended in the SEM literature; see, for 
example, [30, 31]. Comparing the distributions of the labeled 
samples with the ones of the posterior distributions of the 
sorted radial frequencies given k = 3 shown in Figure 1, which 
are highly multimodal, reveals the capability of VAPoRS in 
solving label-switching in a variable-dimensional setting. 

Looking at the bottom right panel of Figure 5, the role 
of the point process component in capturing the outliers in 
the observed samples, which cannot be described by the 
Gaussian components, becomes clearer. Note that, without the 
point process component, these outliers would be allocated to 
the Gaussian components and would, consequently, induce a 
significant deterioration of the parameter estimates. 

Table I presents the summaries provided using VAPoRS 
along with the ones obtained using the BMS approach. Con- 
trary to the BMS approach, VAPoRS has enabled us to benefit 
from the information of all probable models to give summaries 
about the middle harder to detect component. Turning to the 
results of VAPoRS, it can be seen that the estimated means 
are compatible with the true radial frequencies. Furthermore, 
the estimated probabilities of presence are consistent with un- 
certainty of them in the variable-dimensional posterior shown 
in Figure 1. 

To observe better the "goodness-of-fit" of the estimated 
Gaussian components, the bottom panel of Figure 6 depicts 
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Table I: Summaries of the variable-dimensional posterior distribution 
shown in Figure 1; VAPoRS vs. the BMS approach. 
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Figure 6: Posterior distribution of the sorted radial frequencies fflj-j 
given k (top) and normalized pdf of the fitted Gaussian components 
(bottom). 



their normalized densities 7 , under the posterior distributions of 
the sorted radial frequencies given k. This figure can be used 
to validate the coherency of the estimated summaries with the 
information in the variable-dimensional posterior distribution. 
It can be seen from the figures that the shape of the pdf 's of 
the estimated Gaussian components are coherent in both the 
location and dispersion with the ones of the posterior of the 
sorted radial frequencies. 

It is also useful for validating the estimated summaries to 
compare the intensity of the estimated parametric model 
defined, in general, as 



h( n ) = J>-./r(-iMi,E/), 



(10) 



where we ignore the point process component, with the 
histogram intensity of all radial frequencies obtained using 
the BMA approach (see [1, Chapter 2] for more information). 
Figure 7 shows such a figure for the specific example of this 
section where the solid black line indicates the intensity of 
the estimated parametric model. These figures also indicate 
the "goodness-of-fit" of the fitted approximate posterior and 
the true one. 

Finally, to validate both the estimated probabilities of pres- 
ence of the Gaussian components and the mean parameter X of 
the Poisson point process component, Figure 8 illustrates the 
posterior distribution of the number k of components together 
with its approximated versions using VAPoRS. It can be seen 
from the figure that VAPoRS well captured the information 
provided in the true posterior of the number k of components. 



7 To obtain the normalized densities, first, we normalized the estimated pdf's 
to have their maximum equal to one. Then, we multiplied the estimated 
probability of presence of each Gaussian component to its corresponding 
normalized estimated pdf. Thus, the maximum of each normalized density 
is equal to the corresponding estimated probability of presence. 
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Figure 7: Histogram intensity of all radial frequencies samples using 
the BMA approach along with the intensity of the fitted parametric 
model obtained using VAPoRS. 
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Figure 8: Posterior distribution of the number k of number of 
components (black) and its approximated version (gray) obtained 
from the fitted model. 



B. Joint Bayesian detection and estimation of astrophysical 
particles in the Auger project 

As the second illustrative example, we show results on a 
signal decomposition problem encountered in the international 
astrophysics collaboration called Auger [17, 18]. The Auger 
project is aimed at studying ultra-high energy cosmic rays, 
with energies in order of 10 19 eV, the most energetic particles 
found so far in the universe. The long-term objective of this 
project is to study the nature of those ultra-high energy parti- 
cles and determine their origin in the universe. Nevertheless, 
they are not observed directly. In fact, when they collide the 
earth's atmosphere, a host of secondary particles are generated, 
some of which, mostly "muons", finally reach the ground. 
To detect them, the Pierre Auger Cosmic Ray Observatory 
was built which consists of two independent detectors; an 
array of Surface Detectors (SD) and a number of Fluorescence 
Detectors (FD). 

The number of muons and their arrival times can be used 
as indications of both the chemical composition and the origin 
of the primary particles (see [17, 18] for more information). 
Here, we concentrate on the signal decomposition problem, 
where the goal is to count the number of muons and estimate 
their individual parameters from the signals observed by SD 
detectors. To show results, we use the Bayesian algorithm 
and the RJ-MCMC sampler developed in [9, 40, 41] for the 
trans-dimensional problem of joint detection and estimation of 
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muons. In this section, we first briefly describe the problem 
and then use VAPoRS to relabel and summarize variable- 
dimensional output samples of the RJ-MCMC sampler devel- 
oped by [9, 40, 41]. 

When a muon crosses a SD tank, it generates photoelectrons 
(PE's) along its track that are, then, captured by detectors and 
create a discrete observed signal. We denote the vector of 
observed signal by n = (m , . . . , «;v) G N N , where the element 
indicates the number of PE's deposited by the muons in the 
time interval 

[ti-\,ti) = [fo + (i-l)*A, to + itA), 

where to is the absolute starting time of the signal and t& = 
25 ns is the signal resolution (length of one bin). 

Each muon has two component-specific parameters, namely, 
the arrival time and the signal amplitude a M . The absorption 
process of the photons generated by a muon is modeled by a 
non-homogeneous Poisson point process with intensity [41, 
Section 2.2] 



(11) 



where p x ,t d {t) is the time response distribution, tj is the rise- 
time and T is the exponential decay (both measured in ns); 
see Figure 9 (bottom) for such exponential shape intensities. 
Then, the expected number of PE's in the bin i is obtained by 
integrating the intensity (11) in the corresponding bin: 



(12) 



20 

pq 

* 10 
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Figure 9: (top) Observed signal n. (bottom) Intensity of the model 
hit | flju, tfi) defined in (1 1). There are k = 5 muons in the signal with 
the true arrival times, i.e., = (105, 169, 267, 268, 498), indicated 
by vertical dashed lines. 



Conditioning on the number k of muons and the vector of 
parameters f M = (t^i, . . . ,t^ k ) and Ou = {a^i,...,a^ k ), and 
assuming that the number of PE's in each bin are independent, 
the likelihood is written as 

N 

p(n\k,tn,an) =Ylp{n i \n i (k,an,t ll )), (13) 

i=i 

where p(rn\fii(k,ai 1 , t^)) is a Poisson distribution with the 
mean fii(k,a^,t^). Then, assuming independence of the 
muons, the expected number of PE's in the /* bin, i.e., 
given k, t^, and becomes 




0.25 0.5 100 200 300 400 500 
p{k\n) t[ns] 

Figure 10: Posterior distributions of the number k of muons (left) and 
the sorted arrival times, f M , given k (right) constructed using 60000 
RJ-MCMC output samples after discarding the burn-in period. The 
true number of components is five. The vertical dashed lines in the 
right figure locate the arrival times. 



(14) 



We will now illustrate the performance of VAPoRS on a 
simulated PE counting signal (see [1, Chapter 4] for results 
on two other simulated experiments). The observed signal of 
the illustrative example considered here consists of five muons 
located at f M = (105, 169,267,268,498) (see Figure 9). The 
posterior distributions of the number k of muons and sorted 
arrival times are shown in Figure 10. Note that, in this 
example, there are two muons with almost equal arrival times, 
i.e., the third and fourth muons. 

Using the BMS approach, the model with four muons 
would be selected (p(k — 4\n) =0.4), although ^5 has an 
almost similar posterior probability of 0.38. Moreover, observe 
that the marginal posterior of the arrival time of the third 
component is bimodal under both ^4 and, more significantly 



so, ^#5. We ran VAPoRS with L = 6 Gaussian components 
on the RJ-MCMC output samples shown in Figure 10 (note 
that p(k<6\n) =0.94). 

Figure 11 shows the histogram of the labeled samples 
and the estimated parameters of the components. From the 
figure, it can be seen that the bimodality effects caused by 
label-switching exhibited in Figure 10 is removed completely 
and the estimated Gaussian components enjoy reasonable 
variances. In the presented summary, there are four muons 
with high probabilities of presence corresponding to the ones 
shown in the bottom row of Figure 10. There are also two other 
muons with comparatively low probabilities of presence. 

In fact, the samples allocated to the point process component 
shown the bottom row of Figure 1 1 can be regarded as the 
residuals of the fitted model, that is, the observed samples 
which the L Gaussian components in have not been able 
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Figure 12: Histograms of the residuals of the fitted model using 
VAPoRS with different values of L= {3,4,6,8}. 




Figure 11: Histogram of the labeled samples along with the pdf's of 
estimated Gaussian components in the model (black solid line) using 
VAPoRS with L = 6 on the variable-dimensional postrior shown in 
Figure 10. The estimated parameters of each component are presented 
in the corresponding panel. 
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Figure 13: Normalized pdf's of the fitted Gaussian components using 
VAPoRS with different values of 6 < L < 9. 



to describe. These residuals can be used, as usual in statistics, 
as a tool for goodness-of-fit diagnostics and model choice. 

Figure 12 illustrates the histograms of the residuals of the 
fitted model for different values of L G {3, 4, 6, 8}. It can be 
seen from the top left panel of Figure 12 that the distribution 
of the residuals corresponding to the case where L = 3 contains 
a few "significant" peaks. The peaks are gradually removed 
by adding Gaussian components. When L = 4, a component 
is added at =261 that captures samples distributed around 
the most significant peak of the top left panel of Figure 12. 
However, there still exist a few peaks, particularly around = 
173 which are captured when L > 6. However, the distribution 
of residuals for the case of L = 6 and L = 8 do not differ 
significantly. Note the decrease of value of X by increasing L. 

Figure 13 compares the normalized intensities of the esti- 
mated Gaussian components for 6 < L < 9. It can be seen from 
the figure that changing L in a reasonable range, say, 6 < L < 9, 
does not influence significantly the final inference. In all cases, 
the six Gaussian components that were estimated in the case 
of L = 6 exist. By moving from L = 6 to L = 9, additional 
Gaussian components are added in the obtained summary with 
very low probabilities of presence, which improve the fit but 
does not change much the final inference. 



IV. Monte Carlo experiment 

The examples of Section III have illustrated the capability 
of VAPoRS to relabel and summarize variable-dimensional 
posterior distributions encountered in two signal decompo- 
sition problems. In order to confirm these findings, we will 
now investigate more systematically, by means of a Monte 
Carlo simulation experiment, how faithfully the approximate 
posterior distribution preserves certain features of the true 
posterior distribution. 

One hundred realizations of the sinusoid detection experi- 
ment described in Section I-B (see Figure 1) were simulated 
and analyzed using the same RJ-MCMC sampler as before. 
The number of RJ-MCMC iterations was set to 100000 and 
the first 20 000 samples were discarded as the burn-in period. 
Then, the samples were thinned to one every fifth. To initialize 
the parametric model in a systematic fashion, we set L to 
the largest k such that its posterior probability is not less than 
0.05. Then, during the process of the SEM-type algorithms, 
if sufficient number of samples, say, 10, is not allocated 
to a Gaussian component (or, equivalently, its probability 
of presence fades to zero), we will remove it from the 
parametric model and decrease L by one. Using this approach 
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generally results in approximate posterior distributions which 
are "richer" than those provided by the BMS approach 8 , in 
the sense that L > k MAP , where k MA? = argmax^. p(k\y). To 
initialize the Gaussian components' parameters, i.e., the means 
Hi and variances sf, we used as previously robust estimates of 
the mean and variances of the posterior distributions of sorted 
radial frequencies given k~L. 

Figure 14 compares various features of the fitted approxi- 
mate posterior distribution q^, obtained using 100 iterations 
of VAPoRS, with the corresponding features of the true 
variable-dimensional posterior distribution. These features are 
described in the rest of this section. 

The scatter plots shown in panels (a), (b), and (c) compare 
the posterior distribution of the number k of components, i.e., 
p(k\y), with its approximated version, denoted here by p(k\y), 
in 100 runs. We only show the posterior probabilities of k = 2 
and k = 3 in this comparison, as the other probabilities were 
close to zero. The digits situated on the right of the points 
in the panel (a) indicate the number of occurrence of the 
corresponding event in 100 runs and k MAP = argmax /t /5(fc|y). 
It can be seen from these three panels that the information 
in p(k\y) was well preserved by the approximated posterior 
distributions. 

Next we compare the performance of VAPoRS with the one 
of the "direct" BMA approach 9 in reconstructing the noiseless 
signal yo = D ai±. To this end, the estimated reconstructed 
noiseless signal is defined as 

yo = E(y |y) 

= £ / E(yo|*,0i*,yMMi*|y)d*i*- (is) 

In the direct BMA approach, using the samples generated with 
the RJ-MCMC sampler, the above integral is approximated by 

1 M 

~BMA _ 1 y- n (/) *(<) 

where DW is the design matrix of the i th vector of the sampled 
radial frequencies fi>^ (;) and fij'.^j) is the posterior mean 

of the amplitudes given Oj^j and its hyperparameters. To 
reconstruct the noiseless signal from the fitted approximate 
posterior using VAPoRS, one can generate R pairs of 

samples (fcM , ®^L r ) ) as explained in Section II-A. Then, we 
set 

Panel (d) compares the normalized reconstruction errors 
when using VAPORS with the ones of the direct BMA 
approach in dB, defined as 

101ogl0 V^^J' (16) 

8 Later, in a post-processing step, since each Gaussian component has been 
endowed with a probability of presence 7Zi, with 1 < / < L, one can decide to 
discard the ones with m smaller than a certain threshold; see [1, Section 3.4.3] 
for more discussion about this idea. 

9 By "direct", we mean that posterior means are approximated using the 
RJ-MCMC samples directly, and not using the VAPoRS posterior. 



where || • || is the L2-norm and we set yo = f® MA and yo = 
y /APoRS , when using the BMA approach and VAPoRS, re- 
spectively. It can be seen from the figure that the normalized 
errors of the reconstructed noiseless signals using the compact 
summary obtained by VAPoRS are quite comparable with the 
ones obtained using the BMA approach. 

Finally, the scatter plots in the last two panels compare 
the expected number of components in the intervals (0,7r/4) 
and (n/4,n/2) using VAPoRS with, again, the ones obtained 
using the direct BMA approach. For the BMA approach, the 
expected number of components in an interval T C (0;n) is 
given by 

i M 

E(JV(r)|y) = ^E(N(T)\k,y)p(k\y) « t, L^OO , 

keje m !=i 

where N^(T) is the number of radial frequencies observed 
in T on the i th sample. On the other hand, from the summary 
provided by VAPoRS, the expected number of components in 
interval T is 




where JY{T;T\{) denotes the probability of T under the Gaus- 
sian distribution with parameters TJ ; . The figures confirm that 
the expected number of components in the chosen intervals 
computed using both approaches are very similar. 

The results shown in this section confirmed that the ap- 
proximate posterior distribution q^ obtained using VAPoRS 
preserves faithfully several important features of the true 
posterior distribution; see [1, Section 3.4] for more results in 
this vein, including a numerical investigation comparison of 
the properties of estimators derived from VAPoRS. 

V. Conclusion 

In this paper, we have proposed a novel algorithm to 
relabel and summarize variable dimensional posterior distri- 
butions encountered in signal decomposition problems when 
the number of component is unknown. For this purpose, a 
variable-dimensional parametric model has been designed to 
approximate the posterior of interest. The parameters of the 
approximate model have been estimated by means of an 
SEM-type algorithm, using samples from the true posterior 
distribution / generated by a trans-dimensional Monte Carlo 
sampler, e.g., the RJ-MCMC sampler. Modifications of our 
initial SEM-type algorithm have been proposed, in order to 
cope with the lack of robustness of maximum likelihood-type 
estimates. 

The relevance of the proposed algorithm, both for sum- 
marizing and for relabeling variable-dimensional posterior 
distributions, has been illustrated on two signal decomposition 
examples, namely, the problem of detection and estimation 
of sinusoids in Gaussian white noise and a particle counting 
problem motivated by the astrophysics project Auger. Most 
notably, VAPoRS has been shown to be the first approach in 
the literature capable of solving the label-switching issue in 
trans-dimensional problems. We have shown that the proposed 
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Figure 14: Comparison of (some features of) the true posterior distribution with its VAPoRS approximation. 
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parametric model provides a good approximation for the pos- 
teriors encountered in both applications. Moreover, VAPoRS 
can provide the user with more insight concerning not only 
the component-specific parameters but also the uncertainties 
about their presence. 

We believe that this algorithm can be useful in the vast 
domain of signal decomposition and mixture model analysis to 
enhance inference in trans-dimensional problems. Theoretical 
investigations are required in order to extend available existing 
convergence results for the SEM algorithm to the SEM-type 
algorithm used in this paper (with correlated input data and 
Metropolis-Hastings updates). Future work will focus on using 
VAPoRS to design more efficient adaptive trans-dimensional 
MCMC methods, as a continuation of the ideas presented in [8, 
9]. 
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